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Abstract 

In this paper we discuss improved estimators for the regression and the disper- 
sion parameters in an extended class of dispersion models (J0rgensen, 1996). This 
class extends the regular dispersion models by letting the dispersion parameter 
vary throughout the observations, and contains the dispersion models as particular 
case. General formulae for the second-order bias are obtained explicitly in disper- 
sion models with dispersion covariates, which generalize previous results by Botter 
and Cordeiro (1998), Cordeiro and McCullagh (1991), Cordeiro and Vasconcellos 
(1999), and Paula (1992). The practical use of the formulae is that we can derive 
closed-form expressions for the second-order biases of the maximum likelihood es- 
timators of the regression and dispersion parameters when the information matrix 
has a closed-form. Various expressions for the second-order biases are given for 
special models. The formulae have advantages for numerical purposes because they 
require only a supplementary weighted linear regression. We also compare these 
bias-corrected estimators with two different estimators which are also bias-free to 
the second-order that are based on bootstrap methods. These estimators are com- 
pared by simulation. 

Keywords: Dispersion models; Dispersion Covariates; Nonlinear Models; Bias Cor- 
rection 



1 Introduction 



The class of dispersion models was introduced by J0rgensen (1997a) and represents a 
collection of probability density functions which has as particular cases, the proper dis- 
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persion models also introduced by J0rgensen (1997b), and the well-known one parameter 
exponential family. It is possible to introduce a regression structure, and that is what 
will be done in this work. We also allow a regression structure on the dispersion param- 
eter. Thus, this regression structure generalizes the exponential family nonlinear models 
(Cordeiro and Paula, 1989), the generalized linear models with dispersion covariates (see, 
for instance, Botter and Cordeiro, 1998), and the generalized linear models (McCullagh 
and Nelder, 1989). We will call from now on, the dispersion model together with its 
regression structure simply by dispersion model. Recently, Simas et al. (2009b) studied 
asymptotic tail properties of distributions in the class of dispersion models. 

Few attempts have been made to develop second-order asymptotic theory for dis- 
persion models in order to have better likelihood inference procedures. An asymptotic 
formula of order n -1 / 2 , where n is the sample size, for the skewness of the distribution 
of P in dispersion models was obtained by Simas et al. (2009c). Moreover, Rocha et al. 
(2009) obtained a matrix expression for the covariance matrix up to the second-order for 
dispersion models with this regression structure. 

The problem of modeling variances has been largely discussed in the statistical lit- 
erature particularly in the econometric area (see, for instance, Harvey, 1976). Under 
normal errors, Atkinson (1985) present some graphical methods to detect heteroscedas- 
ticity. Moving away from normal errors, Smyth (1989) describes a method which allows 
modeling the dispersion parameter in some generalized linear models. 

It is known that maximum likelihood estimators (MLEs) in nonlinear regression mod- 
els are generally biased. These bias become problematic when the study is being done in 
small samples. Bias does not pose a serious problem when the sample size n is large, since 
its order is typically 0(n~ r ), whereas the asymptotic standard error has order 0(n~~ 1 / 2 ). 
Several authors have explored bias in regression models. Pike et al. (1979) investigated 
the magnitude of the bias in unconditional estimates from logistic linear models, when 
the number of strata is large. Cordeiro and McCullagh (1991) gave a general bias formu- 
lae in matrix notation for generalized linear models. Furthermore, Simas et al. (2009a) 
obtained matrix expressions for the second-order bias of the MLEs in a general beta 
regression model. 

The method used to obtain expressions of the 0(n^ v ) bias of the parameters of this 
class of dispersion models is the one given by Cox and Snell (1968). It is also possible to 
perform bias adjustment using the estimated bias from a bootstrap resampling scheme, 
which requires no explicit derivation of the bias function. 

The chief goal of this paper is to obtain closed-form expressions for the second order 
biases of the MLEs of the parameters, of the means of the responses, and of the precision 
parameters of the model. The results are used to define bias corrected estimators to order 
0(n _1 ). We also consider bootstrap bias adjustment. 

The rest of this paper unfolds as follows. In Section 2, we introduce the class of 
dispersion models with dispersion covariates along with the score function and Fisher's 
information matrix. In Section 3, we derive a matrix expression for the second order biases 
of the MLEs of the parameters, and consider analytical and bootstrap bias correction 
schemes. We also show how the biases of the MLEs of the parameters can be easily 
computed by means of auxiliary weighted linear regressions. In Section 4, we obtain the 
second order biases of the MLEs of the means of the responses and precision parameters of 
the model. In Section 5, we consider some special cases in detail. In Section 6, we present 
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simulation results that show that the proposed estimators have better performance in 
small samples, in terms of bias, than the original MLEs. Finally, the paper is concluded 
in Section 7 with some final remarks. In the Appendix we give explicit expressions for 
the quantities needed to calculate the 0(n~ l ) bias of the MLEs of the parameters. 

2 Dispersion models with dispersion covariates 

Let the random variables Yj., . . . ,Y n be independent with each Yj having a probability 
density function of the form 

KivWiAi) = exp{0%,//i) + a(<f>i,y)}, y e R, (1) 

where a(-, ■) and £(-, ■) are given functions, <fi > and fi varies in an interval of the line. 
Exponential dispersion models are a special case of (PQ), obtained by taking t(y,fi) = 
9y — b(9), where /i = b'{9). Proper dispersion models are also a special case of (pQ), 
obtained by taking a(4>,y) = d\((j)) + ^(y), where d\{-) and c^-) are known functions. 
If Y is continuous, tt(-) is assumed to be a density with respect to Lebesgue measure, 
while if Y is discrete 7r(-) is assumed to be a density with respect to counting measure. 
We call (p the precision parameter and a 2 = <p~ l the dispersion parameter. Similarly, the 
parameter \i may generally be interpreted as a kind of location parameter, but fi is not 
generally the expectation of the distribution. 

In order to introduce a regression structure in the class of models ([1]), we assume that 

gi(lM) = Vu = h{%i',0) and 02 (<&) = V2i = h{zj\0), i = l,...,n, (2) 

where Xi = (xn, . . . , Xi mi ) T and = (z^, . . . , z im2 ) are mi and m 2 -vectors of nonstochas- 
tic independent variables associated with the ith response which need not to be exclusive, 
(3 = (/?!,..., (3 p ) T is a p- vector of unknown parameters, 6 = . . . , 6 q ) T is a g- vector 
of unknown parameters, <?i(-) and ^(O are strictly monotonic and twice continuously 
differentiable and are usually referred to as link functions, •) and •) are, possibly 
nonlinear, twice continuously differentiable functions with respect to (3 and 9, respec- 
tively. The regression parameters /3 and 9 are assumed to be functionally independent. 
The regression structures link the covariates and Zj to the parameters of interest /ij 
and (f>i, respectively, where /ij, as described above, is not necessarily the mean of Yj. The 
nxp matrix of derivatives of 771 with respect to (3 is denoted by X = X(f3) = drji/d(3, and 
the n x q matrix of derivatives of 7/2 with respect to 9 is denoted by Z = Z{9) = dr]2/d9, 
and these matrices are assumed to have ranks p and q for all /3 and all 9, respectively. It 
is also assumed that the usual regularity conditions for maximum likelihood estimation 
and large sample inference hold; see Cox and Hinkley (1974, Chapter 9). 

Consider a random sample yi, . . . ,y n from (pQ). The log-likelihood function for this 
class of dispersion models with dispersion covariates has the form 

n 

£((3,9) = ^{^,^) + a(0 i)2/i )}, (3) 

i=l 

Hi = g^irju), 4>i = 5 , 2 _1 ( ? 72j), as defined in ([2]), are functions of (3 and 9, respectively. 
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The components of the score vector, obtained by differentiation of the log-likelihood 
function with respect to the parameters, are given, for r = 1, . . . ,p, as 

tt(ro\ d£ (P> 9 ) STa+>( x d Vi d Vu 1 
Ur{/3,0) = -— — = y^fat(yi,fa)- — -7--, r=l,...,p, 

oPr drju d/3 r 

where t'(y h fa) = dt(y u fa)/dfa, and for R = 1, . . . , q 

U R {/3,9) = -tt — = > + a (fa,yi)}-, — R = l,...,q, 

ov R ai]2i ov R 

where a'(fa,yi) = da(fa,yi)/dfa. Further, the regularity conditions implies that 

E(t'( yi ,fa)) = and E (t(yi, fa)) — —E (a! (fa, yi)) . 

Let d ri = E (d r t(yi, fa) /dul) and a ri = E (d r a(fa, y^/dfal), note that d\ = 0, d = 
further, let t* = (t'(y 1 ,fa), t'(y n , fa % )) T , v = t(y h fa) + a' (fa, y t ), also, define the 
matrix T\ = dia,g(dfa/dr)u), T 2 = di&g(dfa/dr)2i), $ = diag(fa), with diag(fa) denoting 
the n x n diagonal matrix with typical element fa, i — 1, . . . , n. Therefore, we can write 
the (p + q) x 1 dimensional score vector U(() in the form (Up((3,9) J \Ue(/3,9) T ) T ', with 

u (p,9) = x T mt*, 

Ue(J3,6) = Z T T 2 v. [V 

The MLEs of (5 and 9 are obtained as the solution of the nonlinear system U(() = 0. 
In practice, the MLEs can be obtained through a numerical maximization of the log- 
likelihood function using a nonlinear optimization algorithm, e.g., BFGS. For details, see 
Press et al. (1992). 

It is possible to obtain Fisher's information matrix for the parameter vector ( = 
(f3 T , 9 T ) T as 

K e (C) 

where, K P (Q = X T <$>W P X, K e (() = Z T W e Z, W p = diag(-d 2i (dfa/drj u ) 2 ) and W e = 
diag (—a2i(d4>i/di]2i) 2 ). Further, note that the parameters /3 and 9 are globally orthogonal 
(Cox and Reid, 1987). Furthermore, the MLEs ( and K(() are consistent estimators 
of ( and K((), respectively, where K(() is the Fisher's information matrix evaluated 
at (. Assuming that J(() = ]]m n ^ oo K(0/n exists and is nonsingular, we have that 

V™ (( — C ) ~> Np+ q (0, J(C) _1 ), where, -4- denotes convergence in distribution. Hence, if 

Cr denotes the rth component of (, it follows that 



K(C) 



(c-c) {K(crr i/2 ^N(o,i), 



where K(() is the rth diagonal element of K(() 1 . Then, if < a < 1/2, and g 7 
represents the 7 quantile of the N(0,1) distribution, we have, for r = l,...,p, (3 r ± 

/ » \l/2 „ / A \l/2 

Qi-a/2 [Ko(C) RR ) as the limits of asymptotic confidence 
intervals for /3 r and 9r, respectively, both with asymptotic coverage of 100(1 — a)%, 
where Kp(Q rr is the rth diagonal element of K^Q^ 1 and Kg(Q RR is the -Rth diagonal 
element of Kg^) -1 . The asymptotic variances of /3 r and 9 R are estimated by Kp(() rr and 
Kq(C) rr , respectively. 
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3 Bias correction of the MLEs of j3 and 6 



We begin by obtaining an expression for the second order biases of the MLEs of (3 and 9 
in this class of dispersion models with dispersion covariates using Cox and Snell's (1968) 
general formula. With this expression we will be able to obtain bias corrected estimates 
of the unknown parameters. 

We now introduce the following total log-likelihood derivatives in which we reserve 
lower-case subscripts r, s,t,u, . . . to denote components of the /3 vector and upper-case 
subscripts R, S,T,U, . . . for components of the 9 vector: U r = d£/d/3 r , U rS = d 2 £/d/3 r 9 s , 
U rs T = d 3 £/d/3 r d/3 s d9 T , and so on. The standard notation will be adopted for the mo- 
ments of the log-likelihood derivatives: n rs = E(U rs ), re rjS = E(U r U s ), K rSt T = E{U ts Ut), 
etc., where all re's to a total over sample and are, in general, of order 0(n). We define the 

(t) (T) 

derivatives of the moments by k T s = dK rs /d(3t, K rs = dK rs /d9T, etc. Not all the re's are 
functionally independent. For example, re rSjt = K rst — ref*i gives the covariance between 
the first derivative of £(/3, 9) with respect to f3 t and the mixed second derivative with 
respect to f3 r ,/3 s . Further, let re r ' s = — re rs , k r ' s = —k Rs , re r ' 5 = — n rS and k r ' s = —k rs 
be typical elements of If the inverse of the Fisher's information matrix, which are 
0(n- 1 ). 

Let B(f3 a ) and B{6 a) be the 0{n" 1 ) bias of the MLEs for the ath component of the 
parameter vector (3 and the Ath component of the parameter vector 9, respectively. From 
the general expression for the multiparameter O^" 1 ) biases of the MLEs given by Cox 
and Snell (1968), and from the global orthogonality of the parameter (see details in the 
Appendix), we can write 



(5) 



and 

B0 A ) = J2 * AR K SU {"S - \"*su\ ~ \ E K AR K SU *Rsu. (6) 

R,S,U ^ > R,s,u 

These terms together with the cumulants needed to obtain them are given in the Ap- 
pendix. After some tedious algebra, we arrive at the following expression, in matrix form, 
for the second order bias of (3: 

B0) = K^XT^MxZp - ^K P X T ^W P E1, 

where K 13 = K^ 1 = (X T <frWpX)~ l , 1 is an n x 1 vector of ones, Zp is the n x 1 
dimensional vector containing the diagonal elements of X T 'K^X, Wp was defined in 
Section 2, E = diag (tr(XiK /3 )^ , Xi is a p x p matrix with elements d 2 r] li /df3 r df3 s , and 

Let up = W^MxZp, thus the 0{n^ v ) bias of (3 can be written as 

B0) = (X T <$>WpX)- l X T <$>Wp(u p - (1/2)^1). (8) 
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Therefore, the 0(n^ 1 ) bias of /3 §E§ is easily obtained as the vector of regression coefficients 
in the formal weighted linear regression of £p = cop — (1/2) El on the columns of X with 
as weight matrix. 

The 0(n~ l ) bias (jSj) is expressed as the sum of two quantities: (i) B 1 = 
(X T $WpX)- 1 X T $WpU P , the bias for the MLE of the parameter j3 on a linear dispersion 
regression with dispersion covariates with model matrix X and Z, and thus generalizes, 
for instance, the expressions obtained by Cordeiro and McCullagh (1991), and (ii) an 
additional quantity B 2 = -(l/2)(X T §WpX)~ l X T §WpEl due to the nonlinearity of 
the function f\(xi;(3), and which vanishes if f\ is linear with respect to j3, further, this 
expression generalizes, for instance, the expression obtained by Paula (1992). 

Moving to the bias of 9, we have, after a tedious algebra on flU]), the following expression 
for the 0(n _1 ) bias of 9: 

B(9) = K e Z T {M 2 Z e - M 3 Zp} - -K 6 Z T W g Fl, 

where = K^ 1 = (Z t WqZ)~ x , Zq is the n X 1 dimensional vector containing the 
diagonal elements of Z T K 9 Z, Wg was defined in Section 2, F = diag (tT(ZiK e )\ Zi is a 
q x q matrix with elements d 2 rj 2i /d9 R 9 s , 

M 2 = diag (l {(2c4 - a 3l ) + « 2 ^ff }) , ^ 

Ms = diag (f (fe) 2 ft). 

Let now, cog = W ( / 1 {M 2 Zg — M^Z^}, then, we can express the 0(n~ l ) bias of 9 as 

B(9) = (Z T W Z)- l Z T W e (ujg - (1/2)F1). (10) 

Thus, analogously to the 0(n~ l ) bias of (3, the 0(n^ v ) bias of 9 can be obtained 
as the vector of regression coefficients in the formal weighted linear regression of = 
Ug — (1/2) Fl on the columns of Z with W$ as weight matrix. 

Again, the (9(n _1 ) bias ffTOl) is expressed as the sum of two quantities: (i) Qi = 
(Z T WgZ)~ 1 Z T WeUe, the bias of the parameter 9 for a linear dispersion regression with 
dispersion covariates with model matrices X and Z, which generalizes the results obtained 
by Botter and Cordeiro (1998), and (ii) Q 2 = -(l/2)(Z T W e Z)- 1 Z T W e Fl that is due to 
the nonlinearity of the functions f\(xi] (5) and f 2 (zf, 9), and which vanishes if both fx and 
f 2 are linear in (3 and 9, respectively. 

Now, let B(() = (B((3) T , B(9) T ) T , we can then define our first bias-corrected estimator 
C as 

C = c-B((), 

where B(Q denotes the MLE of B(Q, that is, the unknown parameters are replaced 
by their MLEs. Since the bias B(() is of order (9(n _1 ), it is not difficult to show that 

the asymptotic normality y/n — ( j -4 N p+q (0, J _1 (C)) still holds, where, as before, 

we assume that J(() = lim^oo K(()/n exists and is nonsingular. From the asymptotic 

r ~ 1 1/2 

normality of (, we have that ( a ± qi- a /2 \ K(C,) aa > , for a — 1, . . . ,p, p + 1, . . . ,p + q. 
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The asymptotic variance of ( a is estimated by K(() aa , where K(() aa is the ath diagonal 
element of the inverse of the Fisher's information matrix evaluated at (. 

The last approach we consider here, to bias-correcting MLEs of the regression parame- 
ters is based upon the numerical estimation of the bias through the bootstrap resampling 
scheme introduced by Efron (1979). Let y = (yi, . . . , y n ) T be a random sample of size n, 
where each element is a random draw from the random variable Y which has the distri- 
bution function F = F((). Here, ( is the parameter that indexes the distribution, and is 
viewed as a functional of F, i.e., ( = t(F). Finally, let ( be an estimator of ( based on 
y; we write ( = s(y). 

The application of the bootstrap method consists in obtaining, from the original 
sample y, a large number of pseudo-samples y* = (y*, . . . , y*) T , and then extracting in- 
formation from these samples to improve inference. Bootstrap methods can be classified 
into two classes, depending on how the sampling is performed: parametric and nonpara- 
metric. In the parametric version, the bootstrap samples are obtained from F(Q, which 
we shall denote as F^, whereas in the nonparametric version they are obtained from the 

empirical distribution function F, through sampling with replacement. Note that the 
nonparametric bootstrap does not entail parametric assumptions. 
Let B F ((, () be the bias of the estimator ( = s(y), that is, 

B F (CC) = E F [( - Q = E F [s(y)} -t(F), 

where the subscript F indicates that expectation is taken with respect to F. The boot- 
strap estimators of the bias in the parametric and nonparametric versions are obtained 
by replacing the true distribution F, which generated the original sample, with and 

F, respectively, in the above expression. Therefore, the parametric and nonparametric 
estimates of the bias are given, respectively, by 

B Ft (&0 = E F( [ 8 (y)]-t(Fz) and B F ((, () = E F [s(y)} - t(F). 

If B bootstrap samples (y* 1 , y* 2 , . . . , y* B ) are generated independently from the original 
sample y, and the respective boostrap replications (C* 1 , (* 2 , • • • , (* B ) are calculated, where 
s — s {y* b ), b = 1, 2, . . . , B, then it is possible to approximate the bootstrap expectations 
E^.[s(y)] and E^,[s(y)] by the mean = J2b=i C* 6 - Therefore, the bootstrap bias 

estimates based on B replications of ( are 

^(C,C) = C* (0 -s(y) and B p ((,() = C { - ) -s(y), (11) 

for the parametric and nonparametric versions, respectively. 

By using the two bootstrap bias estimates presented above, we arrive at the following 
two bias-corrected, to order (^(n^ 1 ), estimators: 

Cx = S (y)-5 F .(C,C) = 2C-C ( - ) , 
C 2 = S (y)-^(C,C) = 2C-C { - ) . 

The corrected estimates Ci and C2 were called constant-bias-correcting (CBC) estimates 
by MacKinnon and Smith (1998). 
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Since we are dealing with regression models and not with a random sample we need 
some minor modifications to the algorithm given above. 

For the nonparametric case, assume we want to fit a regression model with response 
variable y and predictors x\, . . . , x qi , Z\, . . . , z q2 . We have a sample of n observations pj = 
(yi, Xn, . . . , x igi , zn, . . . , z iq2 ), i = 1, . . . , n. Thus we use the nonparametric bootstrap 
method described above to obtain B bootstrap samples of the pj, fit the model and save 
the coefficients from each bootstrap sample. We can then obtain bias corrected estimates 
for the regression coefficients using the methods described above. This is the so-called 
Random- x resampling. 

For the parametric case, assume we have the same model as for the nonparametric 
case, we thus obtain the estimates /tj and fa (such as in our case where the distribution 
is indexed by fi and <ft) and using the parametric method described above, we obtain 
B bootstrap samples for y~i from the distribution F(p,i,fa), i = l,...,n. We would 
then regress each set of bootstrapped values yl on the covariates Xi, . . . ,x qi , Zi, . . . , z q2 
to obtain bootstrap replications of the regression coefficients. We can, again, obtain bias 
corrected estimates for the regression coefficients using the methods described above. 
This method is called Fixed-x resampling. 

4 Bias correction of the MLEs of \i and </> 

In this Section we obtain the results that are the most valuable to the practioners, namely, 
the 0{n~ l ) bias of \i and of 0, since, for practioners, the interest in a data analysis relies 
on sharp estimates of the responses and of the precision parameters. The fact that 
these results must be computed apart comes from the fact that if (3 and 9 are bias- 
free estimators, to order it is not true, in general, that fa = 1 (fi(x i ; (3)) and 
fa — Q^ihi 2 ^ ^)) wm a l so be bias-free to order Oin^ 1 ). Nevertheless, for practioners, it 
is even more important to correct the means of the responses and the precision parameters 
than correcting the regression parameters. 

We shall first obtain the 0{n~ 1 ) bias of the MLEs of r}\ and r] 2 . Using fl2]) we find, by 
Taylor expansion, that to order 0{n^ 1 ): 



where V p{r}u) is a p x 1 vector with the derivatives dr)ii/d(3 r , V$(r)2i) is a q x 1 vector 
with the derivatives drj2i/d9n. 

Thus, taking expectations on both sides of the above expression yields to this order 



f 1 (xJJ)-f 1 (xJ;f3)=V?( Vll ) T 



and 



f 2 (zf;9)-f 2 (zf;9) = V ( V2i ) T 



9) + -{9-9) T U§-9) 




1 



and 



B{jk) = ZB{9) + If, 
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where, E and F were defined in Section 3, and we used the fact that K 13 and K d are the 
asymptotic covariance matrices of $ and 9, respectively. 
^From similar calculations we obtain to order C^n -1 ) 

B{p) = B(fin)^- + ^Var(^)^ 
drjn 2 dr]f { 

and 

B( * HB(W * + ly artei) gi. 

Let T\ and T 2 be as in Section 2, further, let Si = diag^Lii/dr/^) and S 2 = 
diag(d 2 <f) i /dr] 2i ). Then, we can write the above expressions in matrix notation as 

B(Ji) = ±Ti(2XB0) + E) + Istfp (12) 

and 

B{$) = ^T 2 (2ZB(9) + F)+ l -S 2 Z e , (13) 

where Zp and Zg were defined in Section 3, and the asymptotic covariance matrices of rji 
and rj 2 are XK^X T and ZK 6 Z T , respectively. 

If we combine (I12p and (1131) with ([5]) and (1101) . we will have the following explicit 
expressions for the 0(n~ l ) biases of fi and 0, respectively: 

B x (fi) = l -Ti{2XK^X T ^Wp{up - (1/2)£1) + E) + 
#i(0) = ^T 2 (2Zi^Z T WV^ - (1/2)F1) + F) + l -S 2 Z 6 . 



and 



Lastly, we can use the bootstrap-based 0{n r ) biases to define, bias corrected esti- 
mators of fx and to this order. Then, let Bp n {p) be the vector formed by the first p 



elements of the vector B F .((, £) defined in equation ffTTT) . Bp (9) be the vector formed 
by the last q elements of the vector B and define Bp{p) and B F (9) analogously 
from the vector B F ((,() a l so i n equation (II ip . Thus, we have the following alternative 
expressions for the C(n _1 ) biases of fi and 0, respectively: 

B 2 (p = ^(2X^.0) + F) + Is-iP^ and P 3 (/i) = ^(2X^0) + F) + ^Ppp, 
and 

B 2 {& = ^T 2 (2ZB F( (9) + G) + is 2 P ee and P 3 (0) = ^T 2 (2ZB P (9) + G) + ^S 2 P ee . 

Therefore, we are now able to define the following second-order bias-corrected esti- 
mators for ft and 0: 

p. = fa- B x (p), 1h=fr — B2(P) and ^2 = A - ^3 (A) 

and 

= 0-1^(0), 1 = 0-P 2 (0) and 2 = 0-P 3 (0), 

where, for j = 1,2 and 3, -&,(•) denotes the MLE of Bj(-), that is, the unknown parameters 
are replaced by their MLEs. 
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5 Some special cases 



In this section we examine some special cases of the formula . Some other important 
cases could also be easily obtained because of the advantage of this formula that involves 
only simple operations on suitably defined matrices and can be easily implemented in 
statistical packages or in a computer algebra system such as Mathematica or Maple. 

Table [T] below shows the most common link functions and the quantities needed in 
order to compute the biases of the MLEs of the parameters /3 and 9. In Table [TJ $(•) 
denotes the standard normal distribution function; f(x) = \ j\phx exp{ — l/2x 2 } is the 
density of a standard normal distribution; and f'(x) = — x/\/2tt exp{— l/2x 2 } is the 
derivative of the density of a standard normal distribution. 

Table 1: Most common link functions. 



Link 


Formula 


dfi/dr] 


d 2 [ijdr] 2 


Logit 


log ( M /(l- 


m)) = V 




/x(l-/x)(l-2 M ) 


Probit 




= 7] 






Log 


log(/z) 


= V 






Identity 


n = 


V 


1 





Reciprocal 


^ = 


-- V 




2fi 3 


Square reciprocal 




-- V 


-/i 3 /2 


3/i 5 /4 


Square Root 


v / / I = 


V 


2^ 


2 


C-loglog 


log(-log(l - 


- fj)) = V 


-log(l-^)(l- M ) 


-(l-/x) log(l -n)x 










x(l + log(l-/i)) 


Tangent 


tan(/i) 


= V 


cos(/x) 2 


2 cos(/x) 3 sin(//) 



5.1 Generalized linear models with dispersion covariates 

The results obtained in this subsection generalize the results obtained in the articles by 
Cordeiro and McCullagh (1991) and Botter and Cordeiro (1998). 

We begin by analysing the 0{n^ 1 ) bias of the parameter (3. Here, the function £(•, •) 
has the form t(y, 6) = y9 — b(9), where b'{6) = fi. Thus, consider the function r{6) = b'(9), 
r(-) is called the mean value mapping, the variance function is related to the mean value 
mapping by dr^^/dfi = V(/i) -1 . We have that t{y, r _1 (/i)} = yr _1 (/i) — 6{T _1 (/i)}. 
For generalized linear models, d 2 = —V^ 1 and d 3 = 2V~ 2 V^ l \ where = dV(fj,)/dfi. 
Thus, the matrix W reduces to W = {V~ 1 (dfi/drj) 2 }. The local model matrix X also 
reduces to the matrix X from h(fii) = r/i = xf(3 and E vanishes. Further, we have that 

which is precisely the result obtained by Cordeiro and McCullagh (1991). 
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Table [2] shows the distributions in the exponential family, along with the quantities 
needed to obtain the bias. 

We now move to the bias for the dispersion parameter 9. So, let's consider the 
two-parameter full exponential family distributions with canonical parameters and 4>d. 
Therefore, we have a(<p,y) = 4>c(y) + ai((f>) + a 2 (y), where c(-) is a known appropriate 



function. Then it turns out that «2 
have that 



a'(((j)) and a 3 = a' 2 = a'" ((/)). Then, using ([9]), we 



Mo 




<(0) 



drj 2 



d 2 



dr] 2 drfe 



and 



1 



diag^--^" 1 



d\i 
drj! 



drj 2 



The expressions above agrees with the formula presented by Botter and Cordeiro 
(1998). 

Table |3] presents the values of the derivatives of the function a± for the distributions 
in the exponential family. In Table [3J ■0' m -'(-), m = 0, 1, ... , is the poly gamma function 
defined by ^ m \x) = (d m+1 /dx m+1 ) log T(x), x > 0. 



5.2 Exponential family nonlinear models with dispersion covari- 
ates 

This model generalizes the generalized linear model with dispersion covariates. Recently 
Simas and Cordeiro (2009) provided ajusted Pearson residuals for exponential family 
nonlinear models. We only have the 0(n~ l ) bias computed in the literature for the ex- 
ponential family nonlinear models with constant dispersion parameter (see Paula, 1991). 
The results for the exponential family nonlinear model with disperion covariates are new. 



Table 2: Exponential Family 



Distribution 


V 


VW 




Normal 


1 








Poisson 




1 





Binomial 




1 - 2(i 


-2 


Gamma 




2fx 


2 


Inv. Gauss. 


M 3 


3^ 2 


6/i 



Table 3: Exponential Family 



Distribution 


ai(0) 


<(0) 


<(0) 


Normal 


log y/4> 


1 

2<j> 2 


J_ 


Gamma 


01og(0) - \ogT<p 


\ + 


-£ + W) 


Inv. Gauss. 


log v 7 ^ 


1 

20 2 


03 
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Let us consider the same parameterization from above, i.e., t{y, r~ 1 ( y u)} = yr _1 (/x) — 
6{r _1 (/i)}, with d,T~ 1 (fi)/dfi = V(/i) _1 . Then, the matrices M\,M2 and M3 are the same 
as the ones computed in the previous subsection 

We now present in Table H] the results for two distributions that belong to the class 
of exponential dispersion models introduced by J0rgensen (1987). 



Table 4: Exponential dispersion models. 



Distribution 



do 



d' 



d?, 



GHS 


2 


-8fi 


(2^+1011) 


(m 2 +i) 2 


(m 2 +i) 3 


(/x 2 +l) 3 


Neg. Bin. 
Power Var. 


1 i_ _ 


1 1 


2 2 




(1- M )2 


(1+^)2 M 2 


Exp. Var. 









Among these distributions are the generalized hyperbolic secant and the negative 
binomial. Our results can be applied for a very rich class of models discussed in detail 
in J0rgensen's (1997b) book. He presented several exponential dispersion models in (0Q) 
including the Tweedie class of distributions with power variance function defined by 
taking V(fi) = // and the cumulant generator function bs(9) for 8 ^ 1, 2 by 

b 5 (6) = (2-5)- 1 {(l-5)6}^, 

and bi{6) = exp(#) and b 2 (9) = — log(— 9). We recognize for 5 = 0, 2 and 3, the cumu- 
lant generator corresponding to the normal, gamma and inverse Gaussian distributions, 
respectively. There exist continuous exponential dispersion models generated by extreme 
stable distributions with support M. and positive stable distributions, respectively, when 
5 < and 5 > 2 and compound Poisson distributions for 1 < 5 < 2. We also would like 
to remark that there exists an exponential dispersion model with exponential variance 
function, V(fi) = e M , for more details see the book of Jorgensen (1997b). 

Finally, it is noteworthy that this special case has not been treated in the literature 
until now. 



5.3 Proper dispersion models with dispersion covariates 



For proper dispersion models, the formula ([Zj) have no reduction, since the only difference 
of a proper dispersion model from a dispersion model is the form of the function a(-, •) 
which can be decomposed into a(<p, y) = a 1 (<f)) + a2(y). We will now give the expression for 
the matrices M 2 and M 3 . First we note that for this case «2 = o!{{4>) and = ct' 2 = a'" ((f)). 
Then, using <M), we have that 



M 2 = diag 



<(0) 



drji 



+ <(0) 



d 2 



dr}2 dr\\ 



and 



M 3 = diag 



djj, 
drj! 



dn 2 
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Table 5: Proper Dispersion Models 



Distribution 


d 2 


4 


d 3 


Rec. Gamma 




2/i" 3 




Log-Gamma 


-1 





1 


Rec. Inv. Gauss. 









Von-Mises 


-r(<f>) 









Note that even though the form of a((f>, y) for this case is different from the form of a((p, y) 
for the two-parameter full exponential family model, the expressions for M 2 and M 3 are 
equal. 

But to illustrate the idea on a particular example of proper dispersion model, we will 
consider the von Mises regression model. Then, we now move to von Mises regression 
models which are quite useful for modelling circular data; see Fisher (1993) and Mardia 
(1972). Here, the density is given by 

n{y,V,<f)) = * exp{0cos(y -//)}, (14) 
2nl ((f)) 

where, — n < y < ir, — 7r < ji < ir, <p > 0, and I v denotes the modified Bessel function of 
the first kind and order v (see Abramowitz and Stegun, 1970, Eq. 9.6.1). The density 
in (fT4j) is symmetric around y = fi which is the mode and the circular mean of the 
distribution. <p is a precision parameter in the sense that the larger the value of (f> the 
more concentrated the density around [i gets. It is clear that the density (TH|) is a proper 
dispersion model, since t(y,fi) = cos(y — fi) and ai(0) = log{/ o (0)}- We now begin 
by investigating the skewness for the parameters /3. Then, it is possible to show that 
£{sin(F - fi)} = and E[{cos(Y - ^)} 2 ] = 1 - 0~V(0), where r(<j>) = h{<f>) / I Q {<f>) , 
these results yield d 2 = —r((f>), d 3 = and d' 2 = 0. Further, we have that the matrix 
W = diag{(d/i/dr/) 2 r(0)}. 

Note initially that Iq(4>) = h{4>) and I{(4>) = Io((p)—Ii(4>)/(f) (Abramowitz and Stegun, 
1970; equations 9.6.26 and 9.6.27). Then, a" ((f)) = r'((p) and a'{'((f)) = r"((f>), where, as 
above, r(4>) = h ((f)) /I Q ((f)). 



We have that 

M 2 

and 




M 3 = diag 



r((f>) d\i d 2 fi 
drji dr/j 

r'((f)) d(j) d 2 (f) 
2 dr] 2 dt]2 

r((f) f dfi\ 2 d(f) 
dr]i) df] 2 



We provide in Tables |5] the quantities needed for several distributions in the class of 
proper dispersion models. 

In Table |6] we give the derivatives of the function a\ for several distributions in the 
class of proper dispersion models. 
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Table 6: Proper Dispersion Models 



Distribution 


ai(0) 




<(0) 


<(0) 


Rec. Gamma 


01og(0) - log 


,T0 




+ <(</>) 


Log- Gamma 


01og(0) - log 




| + W) 


-£ + 


Rec. Inv. Gauss. 


logv/0 




_ 202 


03 


Von-Mises 


log J (0) 




r'O) 


r"(0) 



5.4 Some other special cases 

We now investigate some special cases which were first studied by Cordeiro (1983). If we 
take t(y, 9) = yfi-b(fi), ([I]) is a one parameter exponential family indexed by the canonical 
parameter fi. Now, if in ([1]) we assume that t(y, /i) involves a known constant parameter 
c for all observations, t(y,fi) = t(y,/i, c) say, and that 0=1 and a((p,y) = a(c,y). 
For doing this this, several models can be defined within the present framework: normal 
distribution N(fi, c 2 fi 2 ), lognormal LN(fi, c 2 /i 2 ), inverse Gaussian distribution IG(fi, c 2 fj, 2 ) 
with mean /i and known constant coefficient of variation c, Weibull distribution W(fi, c) 
with mean fi and known constant shape parameter c. Here the normal and inverse 
Gaussian distribtuions are not standard generalized linear models since we are considering 
a different parameterization. 

For these models, we have that d 2 = —fan~ 2 , d 3 = fafi~ 3 and d' 2 = 2fan~ 3 , where 
fa and fa are known positive functions of c (see Table [7]). Then, we have the matrix 
W = dia.g{k2^~ 2 (dfi/dr]) 2 }, and hence we are able to obtain the inverse of the information 
matrix, and the matrix Mi. Moreover, w = fafi~ 2 {dn/di]) 2 , and 



Mi = diag 



(Afa - fa)fi i-r-) - fan 



d[i\ _ 2 dfj, d 2 [i 



dr\\ ) drji dr]. 



Table 7: Values of fa and fa for the normal, inverse Gaussian, lognormal and Weibull 
distributions. 

Model fa(c) fa(c) 

Normal (N(fi, c 2 /z 2 )) c~ 2 (l + 2c 2 ) C - 2 (6 + 10c 2 ) 

Inverse Gaussian (IG(fi, c 2 /j 2 )) l/2c" 2 (l + c 2 ) c _2 (3 + c 2 ) 

Lognormal (LN(n, cV)) [log(l + c 2 )]" 1 3[log(l + c 2 )]" 1 

Weibull (W(fji,c)) c 2 c 2 (c + 3) 



6 Numerical Results 

In this section we present the results of some Monte Carlo simulation experiments, where 
we study the finite-sample distributions of the MLEs of (3 and 9 along with their corrected 
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versions proposed in this paper. We use a reciprocal gamma model with square root link 
and a log link in a nonlinear model for the dispersion parameter 

log^ = 9 + Oixij + x 2 2 ti , i = 1, . . . , n, 

where the true values of the parameters were taken as 0o — 1/2, 0i = 1, fa — 2 and 
#o = 1, #i = 2 and $2 = 3. Note also that here the elements of the n x 3 matrix X are: 
— lj-^(/3)i,2 — an d X(/3) i3 = log(x 2j i)x2 2 j. The explanatory variables x\ 
and X2 were generated from the uniform U(0, 1) distribution for sample size n = 20, and 
their values were held constant throughout the simulations. The number of Monte Carlo 
replications was set at 5, 000 and all simulations were performed using the statistical 
software Ft. 

In each of the 5, 000 replications, we fitted the model and computed the MLEs (3, 
9, its corrected versions from the corrective method (Cox and Snell, 1968), preventive 
method (Firth, 1993) and the bootstrap method both of its parametric and nonparametric 
versions (Efron, 1979). The number of bootstrap replications was set to 500 for both 
bootstrap methods. 

In order to analyze the results we computed, for each sample size and for each estima- 
tor, the mean of estimates, bias, variance and mean square error (MSE). Table [S] present 
simulation results. 

Lastly, in each replication we estimated the confidence interval for each parameter for 
each estimator, and verified if the true value of the parameter belonged to this estimated 
confidence interval. After that we obtained the average of the number of confidence 
intervals that contained the true parameter. In this way we were able to check if the 
estimated confidence interval was close to its nominal level of confidence. The confidence 
intervals were constructed following the strategies stated at the end of Section 2 and at 
Section 3. 

Table M presents simulation results for sample size n = 20 with respect to the pa- 
rameters (3 and 9. We begin by looking at the estimated biases, in absolute value, of 
the estimators. Initially, we note that for all parameters the biases of the corrective es- 
timators were smaller than those of the original MLEs. However, for all parameters the 
biases of the preventive estimators were larger than those of the original MLEs. More- 
over, not only the biases were larger but also the MSEs were larger as well, which shows 
that the preventive method does not work well for this model. The same phenomenon 
occurred in Ospina et al. (2006), which corroborates the idea that this method has some 
problems in beta regression models. We now observe that the MSE of the corrective 
estimators were smaller than those of the MLEs for all parameters, showing that the 
correction is effective. Moving to the bootstrap corrected-estimators, we note that the 
parametric bootstrap had the smallest MSE for all parameters, even though the biases 
were not the smallest. However, the MSEs were very close to the MSE of the corrective 
method, and the computation of the parametric bootstrap biases is computer intensive, 
whereas the corrective method is not. Lastly, we observe that for all parameters 9 the 
MSE of the nonparametric bootstrap corrected estimators were smaller than those of 
the MLEs. Moreover, for the parameters /3, the MSE of the nonparametric bootstrap 
corrected estimators were very close to those of the MLEs, showing that this method is 
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satisfactory, and is very easy to implement by practitioners since no parametric assump- 
tions are made. Therefore, for the small sample size n = 20, we were able to conclude 
that the corrective method by Cox and Snell (1969) was successfully applied, as well as 
the bootstrap corrections. 



Table 8: Simulation results. 



Parameter 


MLE 


Cox-Snell 


p-boot 


np-boot 


A) 


0.6356 


0.5552 


0.5728 


0.6001 


Bias 


0.1356 


0.0552 


0.0728 


0.1001 


Variance 


0.0716 


0.0707 


0.0683 


0.0755 


MSE 


0.0899 


0.0737 


0.0735 


0.0855 


Pi 


0.9383 


1.0220 


0.9535 


1.0519 


Bias 


-0.0617 


0.0220 


-0.0465 


0.0519 


Variance 


0.0251 


0.0224 


0.0203 


0.0261 


MSE 


0.0289 


0.0228 


0.0224 


0.0287 


02 


1.8853 


2.0075 


1.9783 


1.9099 


Bias 


-0.1147 


0.0075 


-0.0217 


-0.0901 


Variance 


0.0348 


0.0316 


0.0289 


0.0331 


MSE 


0.0479 


0.0317 


0.0293 


0.0412 





1.0531 


1.0211 


1.0248 


1.0612 


Bias 


0.0531 


0.0211 


0.0248 


0.0612 


Variance 


0.5805 


0.5332 


0.4669 


0.4841 


MSE 


0.5833 


0.5336 


0.4675 


0.4878 


0i 


2.1077 


1.9934 


1.9872 


2.1067 


Bias 


0.1077 


-0.0066 


-0.0128 


0.1067 


Variance 


0.3001 


0.2222 


0.2345 


0.2300 


MSE 


0.3117 


0.2222 


0.2347 


0.2414 


62 


3.0464 


3.0077 


3.0115 


3.0519 


Bias 


0.0464 


0.0077 


0.0115 


0.0519 


Variance 


0.1101 


0.0858 


0.0686 


0.0525 


MSE 


0.1122 


0.0858 


0.0687 


0.0551 



Table M presents the simulation results for sample size n = 20 with respect to coverage 
of the interval estimates on different nominal converages 1 — a = 90%, 95% and 99%. All 
confidence intervals were defined such that the probability that the true parameter value 
belongs to the interval is 1 — a, the probability that the true parameter value is smaller 
than the lower limit of the interval is a/2 and the probability that the value of the 
parameter is greater than the upper limit of the interval is a/2 for < a < 1/2. 

We begin by noting that the confidence intervals induced by the Firth estimates 
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had the worst coverage, and therefore are not reliable. Further, the MLE and the non- 
parametric bootstrap had a similar behavior. The best coverage is from the corrective 
method Cox-Snell, all the coverage were closer to the nominal level with the Cox-Snell 
than any other estimator. Finally the parametric bootstrap had a poor perfomance with 
respect to the coverage of the confidence interval. The reason for that, we believe, is that 
the bootstrap estimator had the smallest MSE, which was in fact, due to the fact that 
it had the smallest variance among all the other estimators as seen in Table El therefore 
the confidence intervals induced by the parametric bootstrap estimator had the smallest 
average length, which yielded this poor coverage. 



Table 9: Coverage of the interval estimates of the parameters. 



a 


Estimator 




00 




01 




02 




0o 




0i 




02 




MLE 


0. 


8275 


0. 


,7928 





,8079 





.7911 


0. 


,7245 


0. 


7811 




Cox-Snell 


0. 


8788 


0. 


.8213 


0, 


.8710 


0. 


,8482 


0. 


,7826 


0. 


8296 


10% 


p-boot 


0. 


8131 


0. 


.7897 





.7771 





.7642 


0. 


,7155 


0, 


.7721 




np-boot 


0. 


8352 


0. 


.8013 





.8239 





.7965 


0. 


,7307 


0. 


8004 




MLE 


0. 


8827 


0. 


.8417 





.8913 





.8608 


0. 


,8424 


0. 


8549 




Cox-Snell 


0. 


9279 


0. 


,8981 


0. 


.9311 


0. 


,9035 


0. 


,8745 


0. 


8894 


5% 


p-boot 


0. 


8560 


0. 


,8100 





.8475 





.8382 


0. 


,8133 


0. 


8351 




np-boot 


0. 


8846 


0. 


,8592 





.9022 





.8768 


0. 


,8488 


0. 


8673 




MLE 


0. 


9592 


0. 


,9216 





.9665 





.9439 


0. 


,9133 


0. 


9288 




Cox-Snell 


0. 


9771 


0. 


,9653 


0. 


9803 


0. 


.9608 


0. 


,9409 


0. 


9573 


1% 


p-boot 


0. 


9385 


0. 


9037 





.9318 





.9194 


0. 


,8867 


0. 


8999 




np-boot 


0. 


9678 


0. 


,9284 





.9621 





.9518 


0. 


,9175 


0. 


9334 



Finally, we would like to remark that one may build hypothesis tests upon confidence 
intervals. Further, if the confidence level of the confidence interval is 1-a, then the test 
based on this confidence interval will have significance level a. Moreover, the tests based 
on the confidence intervals used in this article are equivalent to Wald tests. Therefore, the 
hypothesis tests based on the confidence intervals would have significance levels closest 
to the nominal level when using the corrective method. 

7 Conclusion 

We defined a general dispersion model which allows a regression structure on the precision 
parameter, in such a way that the regression structures on both the mean and the precision 
parameters are allowed to be nonlinear. Then, using the approximation theory developed 
by Cox and Snell (1968), we calculate the O (n^ 1 ) bias for the MLEs for and 0. 

The dispersion models extends the well-known generalized linear models and also the 
exponential family nonlinear models. It is also important to say that is also generalizes 
the class of Proper dispersion models introduced by J0rgensen (1997a). Several properties 
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and applications of dispersion models can be found on the excellent book of J0rgensen 
(1997b). 

Our results, thus, generalize, for instance, the formulae obtained by Cordeiro and Mc- 
Cullagh (1991), Paula (1992), Cordeiro and Vasconcellos (1999) and Botter and Cordeiro 
(1998). We then defined bias-free estimators to order O (n^ 1 ), by using the expressions 
obtained through Cox and Snell's (1968) formulae. We also considered two schemes of 
bias correction based on bootstrap. 

Finally, we considered a simulation study in a nonlinear reciprocal gamma model with 
nonlinear dispersion covariates. The simulation suggested, among other things, that bias- 
corrected up to the second-order estimators should be used instead of the usual MLEs. 
Furthermore, we were able to notice that the analytical bias-corrected estimators had the 
smallest biases, whereas the bias-corrected estimators using parmetric bootstrap scheme 
had the smallest mean square error. Note that, even though the parametric bootstrap 
had the least mean square error, this fact yielded that the confidence intervals induced 
by the bootstrap estimator had the poorest coverage, mainly because its small variance 
produced confidence intervals with small length. Nevertheless, the confidence intervals 
obtained by the corrective method were the best in terms of coverage closer to the nominal 
level. 



Appendix 

We give explicit expressions for the cumulants and their derivatives, both defined in 
Section 3. Further, we give the expressions for each quantity contained in equations (0) 
and ([6]), some of them are also deduced to help the reader who might be interested in 
checking the results. 

Consider initially the following notation for the derivatives, and product of the deriva- 
tives, of the predictor with respect to the regression parameters: 



{rs)i 



and so on. Recall that 

& 



E 



diX. 



, {RS)i 



d 2 7]2i 

de R de & 



{rs,T)i 



d 2 r]u dr] 



2 1 



d(3 r d/3 s d9 7 



and av 



E 



& 



-a(4>i,Yi) 



By using these quantities, the cumulants can be written as 



K rS 
KRS 



0, 

n 



2 1 



dfM 
drjh 



i=l 



«2i 



dr] : 



2 1 



(r, s)i 



(R,S)i 
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n , 
t=l ^ 



2 



2 



{(rs, w)i + (ru, s), + (s«, r)*}, 



«rSf/ — 0, 



SMS +3a2> S^} WS ' C,) " 

+ a 2i (|^J {(AS, £/), + (fl£7, S), + (5C7, B),}. 
Differentiating the second order cumulants with respect to the parameters, we have 



drju J drju drj 



K rs 



+ y^did 2i ( J {(rw, s)i + (sM,r)i}, 
^ WW 

~ \drjxij drj2i 



K RS — 0' 

JU) 
^RS 



+ y> 2i (— ) {{RU,S) l + (SU,R) t }, 



K rS — U > 



k (u) - 



We now recall let M%, M 2 and M 3 be the diagonal matrices given in equations 
(JZJ) and ([9]). Let rriji be the ith diagonal element of the matrix Mj. Also, let 
W/3 = diag (— d2i(dfj,i/drjii) 2 ) and W# = diag (— a2i(d(f>i/drj2i) 2 ), and and w t i be the 
diagonal elements of Wp and We, respectively. We then, have that the 0(n~ r ) bias of ft, 
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B0) is 

B0 a ) = J2^ ar ^ U {^ ) -^rsu\ = ^m^^^E^M* 

r,s,u ^ ' i=l r s,u 

n 1 

- ^2 2^ iWbi E KaTKSW {( rS , U )i ~ ( rM > S )i ~ ( SM > r 
i=l r,s,u 

n 

i=l r 
n 1 

- ^ -&iu w ^ /t ar (r)i ^ K su (su)i 

i=l r s,u 

= 5 T a Y J K ^ T5 i<t>i m ^J^X T )5 i 

i=l 

n 1 

i=l 

= 5^K^X T <^M 1 Zp - ]-5 T a K p X T ^WpEl, 

where 8 a is a p x 1 vector with a one in the ath position and zeros elsewhere, and the 
matrices E, Zp, and K 13 were defined in Section 3. 
Analogously, one uses the expression 

B (6 a ) = ^ su {«£ - + £ {«s - ^4 , 

fl,s,tt ^ ' > R,S,U ^ ~ ) 

to obtain that 

fl(0 o ) = 5 T a K e Z T {M 2 Z e - M 3 Zp} - ]f a K 6 ~Z T W e F\, 

where, in this case, a = 1, . . . , q, and the matrices Zp, K e , and F were also defined in 
Section 3. 
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